DELPHINE BLANKE AND DENIS BOSQ



Abstract. In this paper, we adopt a Bayesian point of view for predicting real stochastic
processes. We give two equivalent definitions of a Bayesian predictor and study some of its
properties: admissibility, prediction sufficiency, unbiasedness, comparison with efficient predictors.
Prediction of the Poisson process and prediction of the Ornstein-Uhlenbeck process in the
continuous and sampled situations are considered. Various simulations illustrate the comparison with
non-Bayesian predictors.



1. Introduction 

Many papers are devoted to Bayesian estimation, but Bayesian prediction appears much less
often in the statistical literature. However, this topic is important, in particular when
the number of data is small. In this paper, we study some properties of Bayesian predictors
and give examples of applications to prediction of stochastic processes. Various simulations
illustrate the results. Our goal is to compare the efficiency of Bayesian predictors with that of
non-Bayesian predictors, especially when few data are available.

Section 2 presents the general prediction model; in this context estimation appears as
a special case of prediction. The main point of the theory is the fact that, given the data
$X$, a statistical predictor of $Y$ is an approximation of the conditional expectation $E_\theta(Y|X)$,
where $\theta$ is the unknown parameter. Section 3 deals with Bayesian prediction: we give two
equivalent definitions of a Bayesian predictor linked with the equivalence of predicting $Y$ and
$E_\theta(Y|X)$. However, in some situations, it is difficult to get an explicit form of the Bayesian
estimator $E(\theta|X)$, thus it is more convenient to substitute the conditional expectation of $\theta$
with the conditional mode of $\theta$, also called maximum a posteriori (MAP). We recall some
properties of the MAP and underscore its link with the maximum likelihood estimator. In
Section 4 we study some properties of Bayesian predictors: admissibility, connection with
sufficiency and unbiasedness, case where the conditional expectation admits a special form.
Section 5 considers the regular case of Poisson process prediction. We compare the unbiased
efficient predictor with the Bayesian and the MAP ones. Concerning diffusion processes,
Thompson and Vladimirov (2005) study Bayesian prediction, but they obtain an intricate
result, difficult to handle; they do not try to compare their results with classical predictors.
For the Ornstein-Uhlenbeck process, we deal with prediction in Section 6 for the centered
and non-centered cases and with various priors, while Section 7 is devoted to the sampled
case. Some asymptotic results are given along the paper, but, since the non-asymptotic case
is the most important in the Bayesian perspective, theoretical and numerical comparisons
focus on this point.

2. The prediction model

In the sequel, each space is equipped with a suitable $\sigma$-algebra and each application is
supposed to be measurable with respect to these $\sigma$-algebras. Now, let $(\Omega, \mathcal{A}, P_\theta, \theta \in \Theta)$ be a
statistical model, where $\theta$ is the unknown parameter, and let $(X, Y) : (\Omega, \mathcal{A}) \to (E \times F, \mathcal{E} \otimes \mathcal{F})$
be a random vector. One observes $X$ and wants to predict $Y$. Actually, it is convenient to
consider the more general problem: "predict $g = g(X, Y, \theta)$ given $X$" (cf. Yatracos, 1992).

In this paper, we suppose that $g$ is $\mathbb{R}$-valued and denote by $p = p(X)$ a statistical predictor.
We suppose that $E_\theta(g^2) < \infty$ and $E_\theta(p^2) < \infty$, $\theta \in \Theta$. If $p$ and $q$ are two predictors, a
classical preference relation is

$$p \prec q \ (g) \iff E_\theta(p - g)^2 \le E_\theta(q - g)^2, \quad \theta \in \Theta,$$

where "$(g)$" means "for predicting $g$".

Now, let $E_\theta^X(g) = E_\theta(g|X)$ be the conditional expectation of $g$ given $X$. The next lemma
is simple but important.

Lemma 2.1. We have $p \prec q \ (g) \iff p \prec q \ (E_\theta^X(g))$.
Proof. It suffices to apply the Pythagoras theorem:

$$E_\theta(p(X) - g)^2 = E_\theta(p(X) - E_\theta^X(g))^2 + E_\theta(E_\theta^X(g) - g)^2$$

and

$$E_\theta(q(X) - g)^2 = E_\theta(q(X) - E_\theta^X(g))^2 + E_\theta(E_\theta^X(g) - g)^2;$$

the result follows. □

Prediction theory has some similarities but also some differences with estimation theory.
In the following, we will only recall some necessary definitions and results. We refer to
Bosq and Blanke (2007), chapters 1 and 2, for a more complete exposition.



3. Bayesian prediction 

3.1. Bayesian predictors. In a Bayesian context, one considers the random vector $(X, Y, T)$.
The prior distribution of $T$ over $(\Theta, \mathcal{T})$ is defined by $\tau$ and the distribution $P$ of $(X, Y, T)$ is
defined as

$$P(X \in A, Y \in B, T \in C) = \int_C P_\theta(X \in A, Y \in B)\, d\tau(\theta), \quad A \in \mathcal{E},\ B \in \mathcal{F},\ C \in \mathcal{T}.$$

The Bayesian risk is given by

$$E(p(X) - Y)^2 = \int_\Theta E_\theta(p(X) - Y)^2\, d\tau(\theta) := r(p, Y),$$

hence the Bayesian predictor is $p_0(X) = \arg\min_p r(p, Y) = E(Y|X)$, where $E(\cdot|\cdot)$ is the
conditional expectation associated with $P$. We will say that $p_0$ is unique if two versions of $p_0$ differ
only on a set $N$ with $P_\theta(N) = 0$, $\theta \in \Theta$. Now, we consider the following regularity assumption:

Assumption 3.1. There exists a common version $m(X, \theta)$ of $E_\theta(Y|X)$, $\theta \in \Theta$.

Note that Assumption 3.1 is satisfied if $(X, Y, T)$ has a strictly positive density with respect
to the $\sigma$-finite measure $\lambda \otimes \mu \otimes \nu$, say $L(x, y, \theta)\varphi(\theta)$, $L$ denoting the conditional density of
$(X, Y \,|\, T = \theta)$. Then $p_0(X) = \int y\, f_{Y|X}(y|X)\, d\mu(y)$ where

$$f_{Y|X}(y|X) = \int L(X, y, \theta)\varphi(\theta)\, d\nu(\theta) \Big/ \iint L(X, y, \theta)\varphi(\theta)\, d\mu(y)\, d\nu(\theta).$$

In that situation $p_0$ is unique.

We now derive an alternative form of $p_0$:

Lemma 3.1. If Assumption 3.1 holds, then $m(X, T) = E(Y|X, T)$, and $p_0(X) = E(m(X, T)|X)$.

Proof. Let $h(X, \theta)$ be a real bounded function. Then, since $m(X, \theta)$ is a version of $E_\theta(Y|X)$
for all $\theta$, we have

$$E_\theta\big((Y - m(X, \theta))\, h(X, \theta)\big) = 0, \quad \theta \in \Theta,$$

hence

$$E\big((Y - m(X, T))\, h(X, T)\big) = \int E_\theta\big((Y - m(X, \theta))\, h(X, \theta)\big)\, d\tau(\theta) = 0.$$

It follows that $m(X, T) = E(Y|X, T)$. Now, the Bayesian risk has the decomposition

$$r(Y, p(X)) = E(Y - m(X, T))^2 + r(m(X, T), p(X)),$$

thus it is equivalent to minimize $r(Y, p(X))$ and $r(m(X, T), p(X))$; consequently

$$p_0(X) = E(m(X, T)|X). \qquad \Box$$

Clearly, Lemma 3.1 holds if $Y$ is replaced by $g(X, Y, \theta)$.



3.2. The MAP predictor. An alternative method of Bayesian prediction is based on the
conditional mode: one may compute the mode of the distribution of $Y$, given $X$, with
respect to $P$. If a strictly positive density does exist, the distribution of $(X, Y)$ has marginal
density

$$f(x, y) = \int_\Theta L(x, y, \theta)\varphi(\theta)\, d\theta$$

and, in fact, it suffices to compute $\arg\max_y f(x, y)$ ($x$ fixed). Another method consists in
determining the mode of $T$ given $X$ and plugging it into the conditional expectation $E_\theta(Y|X)$.
This mode (also called maximum a posteriori, MAP) has the expression

$$\check\theta(x) = \arg\max_\theta \frac{\ell(x, \theta)\varphi(\theta)}{\int_\Theta \ell(x, \theta)\varphi(\theta)\, d\nu(\theta)} = \arg\max_\theta \ell(x, \theta)\varphi(\theta)$$

where $\ell(x, \theta) = \int L(x, y, \theta)\, d\mu(y)$, hence the MAP predictor

$$\check p(X) = E_\theta(Y|X)\big|_{\theta = \check\theta(X)} = m(X, \check\theta(X))$$

under Assumption 3.1. It is noteworthy that, if $\Theta = \mathbb{R}$ and one chooses the improper
prior $1 \cdot \lambda$, where $\lambda$ is Lebesgue measure, the obtained estimator is the maximum likelihood
estimator (MLE). Note also that, if $\ell(x, \theta)\varphi(\theta)$ is symmetric with respect to $\check\theta(X)$, the MAP and the
Bayes estimator of $\theta$ coincide. Finally, it is clear that, under classical regularity conditions,
the MAP and the MLE have the same asymptotic behaviour, both almost surely and in
distribution. Now, the MAP has some drawbacks: it is often difficult to compute and
uniqueness is not guaranteed. We will use the MAP in Sections 5 to 7.

4. Properties of Bayesian predictors 
We give below some useful properties of Bayesian predictors. 

4.1. Admissibility. 

Proposition 4.1. A Bayesian predictor is admissible as soon as it is unique. 

Proof. If $p_0$ is not admissible, there exists a predictor $p$ such that

$$E_\theta(p - g)^2 \le E_\theta(p_0 - g)^2, \quad \theta \in \Theta.$$

Integrating with respect to $\tau$ entails $r(p, g) \le r(p_0, g)$, but, since $p_0$ is Bayesian, it follows
that $r(p, g) = r(p_0, g)$ and uniqueness of $p_0$ gives $p = p_0$ ($P_\theta$ a.s. for all $\theta$). □

4.2. $g$-Sufficiency. A statistic $S = S(X)$ is said to be $g$-sufficient (or sufficient for predicting
$g$) if

(a) $S$ is sufficient in the statistical model associated with $X$: there exists a version of
the conditional distribution of $X$ given $S$, say $Q^S$, that does not depend on $\theta$.

(b) $X$ and $g$ are conditionally independent given $S$.

Note that this does not imply that $E_\theta(Y|S(X))$ is constant with respect to $\theta$, since $Y$ is not
in the model associated with $X$ (see the example of the Poisson process in Section 5). If $S$ is
$g$-sufficient, it is then possible to derive a Rao-Blackwell theorem as well as a factorization
theorem (cf. Bosq and Blanke, 2007). Now, we have



Lemma 4.1. If $S$ is $Y$-sufficient and if Assumption 3.1 holds, then
(4.1) $E_\theta(Y|X) = E_\theta(Y|S(X)), \quad \theta \in \Theta.$

Proof. Note first that $m(X, \theta)$, respectively $\int m(X, \theta)\, dQ^S$, is a version of $E_\theta(Y|X)$, resp. of
$E_\theta(Y|S(X))$. For the sake of clarity, we use the notation $E_\theta(Y|X)$ and $E_\theta(Y|S(X))$. Now,

$$E_\theta(Y|S(X)) = E_\theta\big(E_\theta(Y|X)\,|\,S(X)\big),$$

and applying (b) to $Y$ and $E_\theta(Y|X)$ we obtain

$$E_\theta\big(Y \cdot E_\theta(Y|X)\,|\,S(X)\big) = E_\theta(Y|S(X)) \cdot E_\theta\big(E_\theta(Y|X)\,|\,S(X)\big) = \big(E_\theta(Y|S(X))\big)^2.$$

Taking expectation and noting that

$$E_\theta\big(Y \cdot E_\theta(Y|X)\big) = E_\theta\big((E_\theta(Y|X))^2\big)$$

entails

$$E_\theta\big((E_\theta(Y|X))^2\big) = E_\theta\big((E_\theta(Y|S(X)))^2\big),$$

that is $\|E_\theta(Y|X)\|^2_{L^2(P_\theta)} = \|E_\theta(Y|S(X))\|^2_{L^2(P_\theta)}$. Relation (4.1) follows since $E_\theta(Y|S(X))$ is
the projection of $E_\theta(Y|X)$ on the space of square-integrable functions of $S(X)$. □

Note that, if $(X, Y)$ has a strictly positive density of the form $L(S(x), y, \theta)$, one obtains
(4.1) by a direct computation. Concerning the Bayesian predictor, we have

Proposition 4.2. If $p_0$ is unique and $S$ is $Y$-sufficient, then $p_0(X) = E(Y|S(X))$.

Proof. Since $p_0(X) = E(Y|X)$, the Rao-Blackwell theorem for prediction (cf. Bosq and Blanke,
2007, p. 15) entails $p_1(X) := E^{S(X)}(E(Y|X)) \prec p_0(X)$, where $E^{S(X)}$ is conditional expectation
with respect to $Q^S$ in (a). Now, from Proposition 4.1, $p_0$ is admissible, thus $p_1 = p_0$;
then $p_0$ has the form $p_0(X) = E(Y|X) = \psi(S(X))$ for some measurable function $\psi$. Taking conditional
expectation with respect to $P$, one obtains the result. □

4.3. Decomposition of the conditional expectation. We now consider the special case
where the conditional expectation admits the following decomposition:

(4.2) $E_\theta(Y|X) = A(X) + B(\theta)C(X) + D(\theta), \quad \theta \in \Theta,$

where $A$, $B \otimes C$, $D \in L^2(E \times \Theta, \mathcal{E} \otimes \mathcal{T}, P_{(X,T)})$; then the Bayesian predictor also has a special
form:

Proposition 4.3. If $E_\theta(Y|X)$ satisfies (4.2), the associated Bayesian predictor is given by

(4.3) $p_0(X) = A(X) + E(B(T)|X) \cdot C(X) + E(D(T)|X).$

In particular, if $X$ and $Y$ are independent and $D(\theta) = E_\theta(Y)$, the predictor reduces to the
estimator $p_0(X) = E(D(T)|X)$.

Proof. Relation (4.2) entails

$$m(X, T) = A(X) + B(T) \cdot C(X) + D(T),$$

and Lemma 3.1 gives $p_0(X) = E(m(X, T)|X)$, hence (4.3) from the properties of conditional
expectation. The last assertion is a special case of (4.3). □

4.4. Unbiasedness. A predictor $p$ of $g$ is said to be unbiased if

$$E_\theta\, p = E_\theta\, g, \quad \theta \in \Theta.$$

A Bayesian estimator is, in general, not unbiased; in fact we have the following:

Lemma 4.2 (Blackwell-Girshick). Let $\varphi(X)$ be an unbiased Bayesian estimator of $\psi(\theta)$,
then

$$E(\varphi(X) - \psi(T))^2 = 0,$$

where $E$ is expectation taken with respect to $P_{(X,T)}$.

Proof. See Lehmann and Casella (1998), p. 234. □

The situation is more intricate concerning a Bayesian predictor. Note first that, if

(4.4) $E_\theta\, p_0 = E_\theta\, g, \quad \theta \in \Theta,$

then $p_0$ is an unbiased estimator of $E_\theta\, g$, but it is not necessarily a Bayesian estimator of
$E_\theta\, g$. Recall that the Bayesian interpretation of (4.4) is:

$$E(p_0 \,|\, T = \theta) = E(g \,|\, T = \theta), \quad \theta \in \Theta.$$
Now, we have the following result:

Proposition 4.4. If the Bayesian risk satisfies

(4.5) $E(p_0(X) - m(X, T))^2 = 0,$

then $p_0(X)$ is unbiased for predicting $Y$. Conversely, if

(4.6) $m(X, \theta) = A(X) + D(\theta)$

and if $p_0(X)$ is unbiased, then (4.5) holds.

Proof. Relation (4.5) implies $p_0(X) = m(X, T)$ $P_{(X,T)}$ a.s., that is

$$E(Y|X) = E(Y|X, T) \quad P_{(X,T)} \text{ a.s.}$$

Conditioning with respect to $T$ gives

$$E(p_0(X)|T) = E(Y|T),$$

which means that $p_0(X)$ is unbiased. Conversely, (4.6) and (4.3) in Proposition 4.3 imply

$$p_0(X) = A(X) + E(D(T)|X).$$

Now, since $p_0$ is unbiased, and $E(Y|X, T) = m(X, T)$ implies

$$E(Y|T) = E(m(X, T)|T),$$

one gets

$$E(A(X)|T) + E\big(E(D(T)|X)\,|\,T\big) = E(Y|T)$$

and

$$E(Y|T) = E(A(X)|T) + E(D(T)|T)$$

by (4.6). This means that the Bayesian estimator of $D(\theta)$ is also unbiased. Then Lemma 4.2
gives

$$E(p_0(X) - m(X, T))^2 = E\big(E(D(T)|X) - D(T)\big)^2 = 0. \qquad \Box$$

In the more general case where $E_\theta(Y|X)$ has the form (4.2) with non-null $B(\theta)C(X)$, it is
possible to find an unbiased Bayesian predictor with a non-vanishing Bayesian risk (cf. Bosq,
2012).



Now, let us define a "Bayesian type" predictor by

(4.7) $p_0(X) = \alpha\, p(X) + (1 - \alpha)\, m(X, \theta_0), \quad (0 < \alpha < 1,\ \theta_0 \in \Theta),$

where $p(X)$ is an unbiased predictor of $Y$. For these specific predictors, our previous result
may be extended as follows.

Proposition 4.5. Suppose that Assumption 3.1 holds and consider a predictor $p_0(X)$ of the
form (4.7). Then, if $p_0(X)$ is unbiased, it follows that

(4.8) $E_\theta(m(X, \theta)) = E_\theta(m(X, \theta_0)), \quad \theta \in \Theta;$

if, in addition, there exists a $Y$-sufficient complete statistic, then $m(X, \theta) = m(X, \theta_0)$, $\theta \in \Theta$,
and the problem of prediction is degenerate.

Proof. If $p_0(X)$ is an unbiased predictor of $Y$, one has

$$E_\theta(p_0(X)) = E_\theta(m(X, \theta)), \quad \theta \in \Theta,$$

and taking expectation in (4.7) yields

$$E_\theta(m(X, \theta)) = \alpha\, E_\theta(p(X)) + (1 - \alpha)\, E_\theta(m(X, \theta_0)),$$

hence, since $p(X)$ is unbiased, (4.8) follows. Now, if $S(X)$ is a $Y$-sufficient statistic, Lemma 4.1
entails $m(X, \theta) = E_\theta(Y|S(X))$, thus (4.8) implies

$$E_\theta\big(E_\theta(Y|S(X)) - E_{\theta_0}(Y|S(X))\big) = 0, \quad \theta \in \Theta,$$

and, since $S(X)$ is complete, one obtains the last result. □

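4.5. Comparing predictors. The following elementary lemma allows one to compare Bayesian
predictors with the classical unbiased predictor. We will use it in the next sections.

Lemma 4.3. Suppose that

$$m(X, \theta) = A(X) + d \cdot \theta \quad (d \ne 0)$$

and let $p(X)$ be an unbiased predictor of $Y$ taking the form

$$p(X) = A(X) + d \cdot \hat\theta(X).$$

Consider the "Bayesian type" predictor

$$p_0(X) = \alpha\, p(X) + (1 - \alpha)\, m(X, \theta_0)$$

where $\alpha \in\, ]0, 1[$ and $\theta_0 \in \Theta$. Then

(4.9) $p_0 \prec p \iff |\theta - \theta_0| \le \Big(\dfrac{1 + \alpha}{1 - \alpha}\Big)^{1/2} \cdot \big(E_\theta(\hat\theta(X) - \theta)^2\big)^{1/2}.$

Proof. We have

$$p_0(X) - m(X, \theta) = \alpha\,(p(X) - m(X, \theta)) + (1 - \alpha)\,(m(X, \theta_0) - m(X, \theta)),$$

then, since $p$ is unbiased and $m(X, \theta_0) - m(X, \theta) = d\,(\theta_0 - \theta)$ is not random, the cross term vanishes and

$$E_\theta(p_0(X) - m(X, \theta))^2 = \alpha^2\, E_\theta(p(X) - m(X, \theta))^2 + (1 - \alpha)^2 d^2 (\theta_0 - \theta)^2,$$

thus

$$p_0 \prec p \iff d^2 (1 - \alpha)^2 (\theta - \theta_0)^2 + \alpha^2\, E_\theta(p(X) - m(X, \theta))^2 \le E_\theta(p(X) - m(X, \theta))^2,$$

and, since $E_\theta(p(X) - m(X, \theta))^2 = d^2\, E_\theta(\hat\theta(X) - \theta)^2$, (4.9) follows. □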
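Condition (4.9) is easy to check numerically. The following is a minimal Python sketch, not part of the original paper; the function names `bayes_type_is_preferable` and `risk_bayes_type` are hypothetical, and the quadratic risk $E_\theta(\hat\theta(X) - \theta)^2$ of the unbiased estimator is assumed to be supplied by the user.

```python
import math


def bayes_type_is_preferable(theta, theta0, alpha, mse_unbiased):
    """Check condition (4.9) at a given theta (hypothetical helper).

    mse_unbiased stands for E_theta(theta_hat(X) - theta)^2 and is assumed known.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in ]0, 1[")
    bound = math.sqrt((1.0 + alpha) / (1.0 - alpha)) * math.sqrt(mse_unbiased)
    return abs(theta - theta0) <= bound


def risk_bayes_type(theta, theta0, alpha, mse_unbiased, d=1.0):
    """Risk E_theta(p_0 - m(X, theta))^2 as decomposed in the proof of Lemma 4.3."""
    variance_part = alpha ** 2 * d ** 2 * mse_unbiased
    bias_part = (1.0 - alpha) ** 2 * d ** 2 * (theta - theta0) ** 2
    return variance_part + bias_part
```

With $d = 1$, comparing `risk_bayes_type(...)` with `mse_unbiased` gives the same verdict as `bayes_type_is_preferable(...)`, which is exactly the equivalence stated in the lemma.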
Remarks: If $X = X_{(n)} = (X_1, \dots, X_n)$ and

E e (9(X)-9) 2 = - 

then the condition is 

If one may find a = a n such that 

mf ( 1 -±^y.(vy>b>o, 



2\ 2 
it follows that $|\theta - \theta_0| \le b$ implies $p_0(X_{(n)}) \prec p(X_{(n)})$ for all $n \ge 1$. Moreover, the choice
$A(X) = 0$ in Lemma 4.3 provides an alternative formulation for comparing Bayesian
estimators of $\theta$ with non-Bayesian ones.

5. Application to Poisson process 

5.1. The Bayesian predictor. Let $(N_t, t \ge 0)$ be a homogeneous Poisson process with
intensity $\theta > 0$; $X = (N_t, 0 \le t \le S)$ is observed and one wants to predict $Y = N_{S+h}$
($h > 0$), ($S > 0$). Lemma 2.1 shows that it is equivalent to predict $m(X, \theta) = \theta h + N_S$. The
unbiased efficient predictor is obtained by replacing $\theta$ with $N_S/S$ (Bosq and Blanke, 2007),
which gives $\hat p(N_S) = \frac{S+h}{S} N_S =: N_S + \hat\theta_S h$.

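Concerning the Bayesian predictor, a classical prior is $\tau = \Gamma(a, b)$ with density

$$\frac{b^a}{\Gamma(a)}\, \theta^{a-1} \exp(-b\theta)\, \mathbf{1}_{]0,+\infty[}(\theta), \quad (a > 0,\ b > 0).$$

Since $N_S$ is $N_{S+h}$-sufficient, Lemma 4.1 entails

$$E_\theta(N_{S+h} \,|\, N_t,\ 0 \le t \le S) = E_\theta(N_{S+h} \,|\, N_S)$$

and Proposition 4.2 gives

$$p_0(N_t,\ 0 \le t \le S) = E(N_{S+h} \,|\, N_S).$$

The same property holds for the Bayes estimator given by

$$\tilde\theta_S = E(T \,|\, N_S) = \frac{a + N_S}{b + S},$$

and, from Proposition 4.3, the Bayesian predictor is

$$p_0(N_S) = \frac{a + N_S}{b + S}\, h + N_S.$$

To compare $p_0$ with $\hat p$, note that

$$\tilde\theta_S = \frac{S}{b + S}\, \hat\theta_S + \Big(1 - \frac{S}{b + S}\Big)\, \frac{a}{b}.$$

We deduce that

$$p_0(N_S) = \alpha_S\, \hat p(N_S) + (1 - \alpha_S)(N_S + \theta_0 h)$$

with $\alpha_S = \frac{S}{b+S}$ and $\theta_0 = \frac{a}{b}$. Since $E_\theta(\hat\theta_S - \theta)^2 = \frac{\theta}{S}$, a straightforward consequence of
Lemma 4.3 is

(5.1) $p_0 \prec \hat p \iff (\theta - \theta_0)^2 \le \Big(\dfrac{2}{b} + \dfrac{1}{S}\Big)\, \theta.$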
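Solving (5.1) in $\theta$, we get that $p_0 \prec \hat p$ iff

$$\theta^2 - 2\theta\Big(\theta_0 + \frac{1}{b} + \frac{1}{2S}\Big) + \theta_0^2 \le 0,$$

that is when

$$\theta \in \Big[\theta_0 + \frac{1}{b} + \frac{1}{2S} - \sqrt{\Delta},\ \theta_0 + \frac{1}{b} + \frac{1}{2S} + \sqrt{\Delta}\Big]$$

with $\Delta = \big(\theta_0 + \frac{1}{b} + \frac{1}{2S}\big)^2 - \theta_0^2$.

Also, from (5.1), a sufficient condition, holding for all $S$, is $(\theta - \theta_0)^2 \le \frac{2}{b}\theta$, which gives
$\theta \in [\theta_1, \theta_2]$ with

$$\theta_{1,2} = \theta_0 + \frac{1}{b} \mp \sqrt{\Delta}, \qquad \Delta = \frac{1}{b}\Big(2\theta_0 + \frac{1}{b}\Big),$$

that is

$$\theta \in \Big[\frac{a + 1 - \sqrt{2a + 1}}{b},\ \frac{a + 1 + \sqrt{2a + 1}}{b}\Big].$$

Clearly, one obtains the same result for comparing $\tilde\theta_S$ with $\hat\theta_S$. If $a = 1$ and $\theta_0 = \frac{1}{b}$, then
$\theta_1 = \frac{2 - \sqrt{3}}{b}$ and $\theta_2 = \frac{2 + \sqrt{3}}{b}$, so the width $\theta_2 - \theta_1 = \frac{2\sqrt{3}}{b}$ is large only when $\theta_0$ itself is large.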



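Turning to the MAP estimator, one has to compute $\arg\max_\theta L(\theta)$, which is equal to

$$\arg\max_\theta\ e^{-\theta S}\, \frac{(\theta S)^{N_S}}{N_S!}\, \frac{b^a}{\Gamma(a)}\, \theta^{a-1} e^{-b\theta}.$$

We have

$$\frac{\partial \ln L(\theta)}{\partial \theta} = -(b + S) + \frac{N_S + a - 1}{\theta},$$

hence $\check\theta_S = \frac{N_S + a - 1}{b + S}$, where we choose $a \ge 1$ for convenience, inducing the predictor:

$$\check p(N_S) = \frac{N_S + a - 1}{b + S}\, h + N_S.$$

Replacing $a$ with $a - 1$, the previous discussion about $p_0$ holds and one gets, for all $S$, the
sufficient condition

$$\frac{a}{b} - \frac{\sqrt{2a - 1}}{b} \le \theta \le \frac{a}{b} + \frac{\sqrt{2a - 1}}{b}.$$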



Finally, another method consists in computing the marginal distribution of $(N_S, N_{S+h})$
and then determining the conditional mode of $N_{S+h}$ given $N_S$. In this case, one obtains a
similar predictor. Details are left to the reader.
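The three Poisson predictors of this section and the comparison interval obtained by solving (5.1) can be sketched in a few lines of Python. This is an illustration only, not the authors' code; all function names are hypothetical, and `n_s` stands for the observed count $N_S$.

```python
import math


def unbiased_predictor(n_s, s, h):
    """Unbiased efficient predictor N_S * (S + h) / S."""
    return n_s * (s + h) / s


def bayes_predictor(n_s, s, h, a, b):
    """Bayesian predictor under a Gamma(a, b) prior: (a + N_S) / (b + S) * h + N_S."""
    return (a + n_s) / (b + s) * h + n_s


def map_predictor(n_s, s, h, a, b):
    """MAP predictor (N_S + a - 1) / (b + S) * h + N_S, meant for a >= 1."""
    return (n_s + a - 1.0) / (b + s) * h + n_s


def preference_interval(a, b, s):
    """Interval of theta on which the Bayesian predictor is preferable, from (5.1)."""
    theta0 = a / b
    centre = theta0 + 1.0 / b + 1.0 / (2.0 * s)
    delta = centre * centre - theta0 * theta0
    return centre - math.sqrt(delta), centre + math.sqrt(delta)
```

Calling `map_predictor(n_s, s, h, a, b)` returns the same value as `bayes_predictor(n_s, s, h, a - 1, b)`, which mirrors the remark that the MAP case follows from the Bayesian one after replacing $a$ with $a - 1$.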

5.2. Simulations. In this section, we compare the unbiased (UP), the Bayesian (BP) and
the MAP predictors for various Poisson processes. First, we simulate $N = 10^5$ homogeneous
Poisson processes with intensity $\theta$ varying in $\{0.5, 1, 2, 5, 10\}$. Next, for $S$ in
$\{10, 15, 20, 25, 30, 40, 50, 75, 100\}$ and horizon of prediction $h$ in $\{0.5, 1, 2, 5\}$, we compute
an approximation of the empirical $L^2$-error of prediction:

$$\frac{1}{N} \sum_{j=1}^{N} \big(N^{(j)}_{S+h} - p(N^{(j)}_S)\big)^2,$$

where $N^{(j)}_t$ stands for the $j$-th replicate of the process at time $t$ and $p(N_S)$ is the predictor
under consideration (Bayesian and MAP predictors are computed with a $\Gamma(a, 1)$ distribution
for the prior). We will also consider the empirical $L^2$-error of estimation (with respect to
the probabilistic predictor $E_\theta(N_{S+h}|N_S)$) defined by

$$\frac{1}{N} \sum_{j=1}^{N} \big(N^{(j)}_S + \theta h - p(N^{(j)}_S)\big)^2.$$

In Table 1 we give the rounded $L^2$-errors of estimation according to $S$ as well as prediction
errors (enclosed in parentheses) for the unbiased predictor when $\theta = h = 1$. To help the
comparison, only the percentage variations of BP and MAP errors (relative to the UP
ones) are reported for $a = 1, 2, 4$. Namely, since $\theta = 1$, it is expected from (5.1) that $a = 4$
represents a bad choice of prior (while $a = 1$ corresponds to the best one, and $a = 2$ is
acceptable). From Table 1 we observe that:

- as expected, all errors decrease as $S$ increases;

- for all errors and any value of $S$, Bayesian and MAP predictors are better than the
unbiased one for $a = 1, 2$, with a clearly significant gain for small values of $S$ in the
estimation framework;

- the bad choice $a = 4$ clearly penalizes the predictor, with a significant impact on
the $L^2$-error of estimation. Concerning the prediction error, this effect is smaller:
unfortunately, the probabilistic error is predominant!
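As a complement to the simulations, the estimation errors can also be computed in closed form from the risk decomposition used in the proof of Lemma 4.3. The Python sketch below is an assumption-laden illustration rather than the authors' simulation code: it returns theoretical risks (not Monte Carlo estimates), the function names are hypothetical, and the prior is $\Gamma(a, b)$ with $b = 1$ by default as in the experiments.

```python
def estimation_risk_up(theta, s, h):
    """Theoretical L2 estimation risk of the unbiased predictor: h^2 * theta / S."""
    return h ** 2 * theta / s


def estimation_risk_bp(theta, s, h, a, b=1.0):
    """Theoretical L2 estimation risk of the Bayesian predictor (Gamma(a, b) prior)."""
    alpha = s / (b + s)
    theta0 = a / b
    return h ** 2 * (alpha ** 2 * theta / s + (1.0 - alpha) ** 2 * (theta0 - theta) ** 2)


def percent_variation_bp(theta, s, h, a, b=1.0):
    """Percentage variation of the BP estimation risk relative to the UP one."""
    up = estimation_risk_up(theta, s, h)
    return 100.0 * (estimation_risk_bp(theta, s, h, a, b) - up) / up


def percent_variation_map(theta, s, h, a, b=1.0):
    """Same quantity for the MAP predictor, obtained by replacing a with a - 1."""
    return percent_variation_bp(theta, s, h, a - 1.0, b)
```

For $\theta = h = 1$ and $S = 20$, these theoretical percentages come out close to the simulated values reported for BP with $a = 1, 2, 4$; small gaps are expected because the table is based on a finite number of replicates.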



TABLE 1. $L^2$ estimation (prediction) error for UP and percentage variation of $L^2$
estimation (prediction) error for BP and MAP, in the case where $\theta = 1$ and $h = 1$.

S     UP              BP % (a=1)     BP % (a=2)    BP % (a=4)    MAP % (a=1)   MAP % (a=2)    MAP % (a=4)
15    0.066 (1.066)   -12.1 (-.74)   -6.3 (-.42)   40.8 (2.42)   -6.1 (-.33)   -12.1 (-.74)   11.3 (.64)
20    0.050 (1.050)   -9.3 (-.43)    -4.8 (-.23)   31.5 (1.45)   -4.7 (-.19)   -9.3 (-.43)    8.8 (.39)
30    0.033 (1.036)   -6.3 (-.21)    -3.2 (-.11)   21.8 (.69)    -3.2 (-.10)   -6.3 (-.21)    6.2 (.19)
40    0.025 (1.027)   -4.8 (-.12)    -2.4 (-.05)   16.8 (.42)    -2.5 (-.06)   -4.8 (-.12)    4.8 (.13)
50    0.020 (1.025)   -3.9 (-.08)    -1.9 (-.03)   13.7 (.28)    -2.0 (-.04)   -3.9 (-.08)    4.0 (.09)
100   0.010 (1.015)   -2.0 (-.02)    -1.0 (-.01)   6.7 (.06)     -0.9 (-.01)   -2.0 (-.02)    1.9 (.02)


In Figure 1, the $L^2$-error of prediction is plotted as a function of $a$ for $\theta = 1$ and $S = 20$.
As expected from (5.1), parabolic curves are obtained and BP (resp. MAP) is better than
UP for $a$ in the interval $\big]0,\ 1 + \sqrt{S^{-1} + 2}\,\big]$ (resp. $\big[2 - \sqrt{S^{-1} + 2},\ 2 + \sqrt{S^{-1} + 2}\,\big]$). The same
conclusions hold for other choices of $h$ and/or $\theta$ (see the selected results in Table 2). Errors
increase as $h$ and/or $\theta$ increase, and a good choice of the prior has a significant impact
on the estimation error. In conclusion, at least for this simulated case, it appears that the
$L^2$-error of prediction can be quite large, especially for small values of $S$ and/or large values
of $h$ and $\theta$.



[Figure 1: two panels, titled "$\theta = 1$, $S = 20$, $b = 1$, $h = 0.5$" (left) and "$\theta = 1$, $S = 20$, $b = 1$, $h = 1$" (right); curves: Unbiased, Bayes, MAP.]

FIGURE 1. $L^2$ prediction error for $\theta = 1$ in terms of $a$ with $\Gamma(a, 1)$ prior: UP
(plain horizontal), BP (dashes), MAP (dots) for $S = 20$. Vertical lines correspond
to $a = 1 + \sqrt{S^{-1} + 2}$ (dashes) and $a = 2 \pm \sqrt{S^{-1} + 2}$ (dots). On the left: $h = 0.5$,
on the right: $h = 1$.

TABLE 2. $L^2$ estimation (prediction) error, in the case $S = 20$, for UP and percentage
variations of $L^2$ estimation (prediction) error for BPi and MAPi, where i
refers to $a = i$.

theta  h     UP            BP1 %         BP2 %         BP4 %          MAP1 %        MAP2 %        MAP4 %
0.5    0.5   .01 (.3)      -6.9 (-.16)   11.5 (.31)    103.3 (2.58)   -7.1 (-.19)   -6.9 (-.16)   48.3 (1.22)
0.5    1     .02 (.5)      -6.9 (-.33)   11.5 (.55)    103.3 (4.89)   -7.1 (-.34)   -6.9 (-.33)   48.3 (2.29)
0.5    2     .1 (1.1)      -6.9 (-.61)   11.5 (1.1)    103.3 (9.52)   -7.1 (-.66)   -6.9 (-.61)   48.3 (4.48)
5      0.5   .06 (2.6)     5.1 (.2)      -1.2 (.02)    -8.4 (-.19)    13.2 (.41)    5.1 (.2)      -5.7 (-.1)
5      1     .25 (5.3)     5.1 (.34)     -1.2 (.02)    -8.4 (-.37)    13.2 (.75)    5.1 (.34)     -5.7 (-.22)
5      2     1 (11)        5.1 (.61)     -1.2 (.0)     -8.4 (-.73)    13.2 (1.39)   5.1 (.61)     -5.7 (-.44)
10     0.5   .12 (5.1)     28 (.75)      20.2 (.55)    7.4 (.23)      36.7 (.97)    28 (.75)      13.3 (.38)
10     1     .5 (10.4)     28 (1.48)     20.2 (1.09)   7.4 (.45)      36.7 (1.91)   28 (1.48)     13.3 (.75)
10     2     1.99 (22.1)   28 (2.7)      20.2 (1.98)   7.4 (.78)      36.7 (3.5)    28 (2.7)      13.3 (1.34)

6. Bayesian inference for the Ornstein-Uhlenbeck process

Consider a stationary version of the Ornstein-Uhlenbeck process (O.U.) defined by
$X_t = m + \int_{-\infty}^{t} e^{-\theta(t-s)}\, dW(s)$, $t \in \mathbb{R}$, ($m \in \mathbb{R}$, $\theta > 0$), where $W$ is a standard bilateral Wiener
process. Set $X_{0,t} = X_t - m$, $t \in \mathbb{R}$; then the likelihood of $X_{(S)} = (X_t, 0 \le t \le S)$ with
respect to $X_{0,(S)} = (X_{0,t}, 0 \le t \le S)$ is given by

(6.1) $L(X_{(S)}; m, \theta) = \exp\Big(-\dfrac{\theta m^2}{2}(2 + \theta S) + \theta m \Big(X_0 + X_S + \theta \displaystyle\int_0^S X_t\, dt\Big)\Big)$

(cf. Grenander, 1981, p. 128-129), where $X_{(S)}$ and $X_{0,(S)}$ take their values in the space
$C([0, S])$, ($S > 0$).



6.1. Estimating $m$. We suppose that $\theta$ is known and $m \in \mathbb{R}$ is unknown. In order to
construct a Bayesian estimator of $m$ and a Bayesian predictor of $X_{S+h}$ ($h > 0$) given $X_{(S)}$, we
consider the random variable $M$ with prior distribution $\mathcal{N}(m_0, u^2)$ ($u > 0$), and suppose that
$M$ is independent of $W$. Using (6.1), it follows that the posterior distribution of $M$ given $X_{(S)}$
is $\mathcal{N}\big(\frac{B}{A}, \frac{1}{A}\big)$, where $A = \theta(2 + \theta S) + \frac{1}{u^2}$ and $B = \theta Z_S + \frac{m_0}{u^2}$ with $Z_S = X_0 + X_S + \theta \int_0^S X_t\, dt$.
Hence the Bayesian estimator of $m$:

$$\tilde m_S = \frac{B}{A} = \frac{Z_S + m_0\, \theta^{-1} u^{-2}}{2 + \theta S + \theta^{-1} u^{-2}},$$

while the maximum likelihood estimator (MLE) is $\hat m_S = \frac{Z_S}{2 + \theta S}$. Consequently

(6.2) $\tilde m_S = \alpha_S\, \hat m_S + (1 - \alpha_S)\, m_0$

with $\alpha_S = \big(1 + \theta^{-1}(2 + \theta S)^{-1} u^{-2}\big)^{-1} \in\, ]0, 1[$. Note that $\lim_{u \to 0} \tilde m_S = m_0$ and $\lim_{u \to \infty} \tilde m_S = \hat m_S$.

Asymptotic efficiency. The MLE $\hat m_S$ is efficient (cf. Bosq and Blanke, 2007, p. 28) and $\tilde m_S$
is asymptotically efficient since, from (6.2),

$$\frac{E_m(\tilde m_S - m)^2}{E_m(\hat m_S - m)^2} = \alpha_S^2 + (1 - \alpha_S)^2\, \frac{(m_0 - m)^2}{E_m(\hat m_S - m)^2}$$

with $\alpha_S^2 \to 1$ as $S \to \infty$, $(1 - \alpha_S)^2 = O(S^{-2})$ and $E_m(\hat m_S - m)^2 = O(S^{-1})$.

Prediction. We have $E_m(X_{S+h}|X_{(S)}) = E_m(X_{S+h}|X_S) = e^{-\theta h}(X_S - m) + m$. The unbiased
predictor associated with the MLE is

$$\hat p_S := \hat p(X_{(S)}) = \hat m_S (1 - e^{-\theta h}) + e^{-\theta h} X_S,$$

and using Proposition 4.2 one obtains the Bayesian predictor

$$p_{0,S} := p_0(X_{(S)}) = \tilde m_S (1 - e^{-\theta h}) + e^{-\theta h} X_S.$$

We get

$$p_{0,S} = \alpha_S\, \hat p_S + (1 - \alpha_S)\big(m_0 (1 - e^{-\theta h}) + e^{-\theta h} X_S\big) = \alpha_S\, \hat p_S + (1 - \alpha_S)\, p(X_S, m_0).$$

Concerning efficiency, again we deduce that $\hat p_S$ is efficient and $p_{0,S}$ is asymptotically
efficient. Now, in order to compare $p_{0,S}$ with $\hat p_S$, we use Lemma 4.3 to obtain the following
result.

Proposition 6.1. We have that $p_{0,S} \prec \hat p_S$ is equivalent to $|m - m_0| \le \big(\frac{1}{\theta(2 + \theta S)} + 2u^2\big)^{1/2}$, and
$|m - m_0| \le u\sqrt{2}$ implies $p_{0,S} \prec \hat p_S$ for all $S > 0$.

The proof is straightforward since one has $E_m(\hat m_S - m)^2 = (\theta(2 + \theta S))^{-1}$. Of course,
the result is strictly the same if one compares $\tilde m_S$ with $\hat m_S$, since $\tilde m_S \prec \hat m_S$ is equivalent to
$p_{0,S} \prec \hat p_S$.
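The estimators and predictors of this subsection can be sketched as follows in Python. This is only an illustration under stated assumptions: the path is assumed to be observed on a regular grid of step `dt`, the integral in $Z_S$ is approximated by the trapezoidal rule (a discretization that is not part of the original text), and all function names are hypothetical.

```python
import math


def z_statistic(path, dt, theta):
    """Z_S = X_0 + X_S + theta * int_0^S X_t dt (trapezoidal rule, an assumption)."""
    integral = dt * (sum(path) - 0.5 * (path[0] + path[-1]))
    return path[0] + path[-1] + theta * integral


def mle_mean(z_s, theta, s):
    """MLE of m: Z_S / (2 + theta * S)."""
    return z_s / (2.0 + theta * s)


def bayes_mean(z_s, theta, s, m0, u2):
    """Posterior mean B / A of M under the N(m0, u2) prior."""
    a_post = theta * (2.0 + theta * s) + 1.0 / u2
    b_post = theta * z_s + m0 / u2
    return b_post / a_post


def alpha_s(theta, s, u2):
    """Weight alpha_S appearing in (6.2)."""
    return 1.0 / (1.0 + 1.0 / (theta * (2.0 + theta * s) * u2))


def ou_predictor(m_est, x_s, theta, h):
    """Plug-in predictor m_est * (1 - exp(-theta h)) + exp(-theta h) * X_S."""
    decay = math.exp(-theta * h)
    return m_est * (1.0 - decay) + decay * x_s


def bayes_is_preferable(m, m0, theta, s, u2):
    """Condition of Proposition 6.1 for the Bayesian predictor to be preferable."""
    return abs(m - m0) <= math.sqrt(1.0 / (theta * (2.0 + theta * s)) + 2.0 * u2)
```

Passing `bayes_mean(...)` or `mle_mean(...)` to `ou_predictor` gives $p_{0,S}$ or $\hat p_S$ respectively, so the convex-combination identity (6.2) carries over to the predictors.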

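6.2. Estimating $\theta$. Suppose now that $\theta$ is unknown and $m$ is known; one may take $m = 0$.
The likelihood of $X_{(S)}$ with respect to $W_{(S)}$ has the form

$$L(X_{(S)}) = \exp\Big(-\frac{\theta}{2}\big(X_S^2 - X_0^2 - S\big) - \frac{\theta^2}{2} \int_0^S X_t^2\, dt\Big),$$

see Liptser and Shiryaev (2001). Even if $\theta$ is positive, it is convenient to take $\mathcal{N}(\theta_0, v^2)$ (with
$\theta_0 > 0$ and $v^2 > 0$) as prior distribution of $T$. Then, the marginal distribution of $X_{(S)}$ has
density $\varphi(x_{(S)}) = \frac{1}{v\sqrt{\alpha}} \exp\big(-\frac{\theta_0^2}{2v^2} + \frac{\beta^2}{2\alpha}\big)$, where $\alpha = \int_0^S x_s^2\, ds + \frac{1}{v^2}$ and $\beta = \frac{S - x_S^2 + x_0^2}{2} + \frac{\theta_0}{v^2}$.
It follows that the conditional distribution of $T$ given $X_{(S)}$ is $\mathcal{N}\big(\frac{\beta}{\alpha}, \frac{1}{\alpha}\big)$, hence the Bayesian
estimator of $\theta$:

$$\tilde\theta_S = \frac{\beta}{\alpha} = \frac{\frac{1}{2}\big(S - X_S^2 + X_0^2\big) + \theta_0 v^{-2}}{\int_0^S X_t^2\, dt + v^{-2}},$$

while the MLE is $\hat\theta_S = \dfrac{S - X_S^2 + X_0^2}{2 \int_0^S X_t^2\, dt}$; consequently

(6.3) $\tilde\theta_S = \gamma_S\, \hat\theta_S + (1 - \gamma_S)\, \theta_0$ with $\gamma_S = \dfrac{\int_0^S X_t^2\, dt}{\int_0^S X_t^2\, dt + v^{-2}}$,

and $\lim_{v^2 \to 0} \tilde\theta_S = \theta_0$ while $\lim_{v^2 \to \infty} \tilde\theta_S = \hat\theta_S$.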

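Concerning prediction, we have $E_\theta(X_{S+h}|X_{(S)}) = e^{-\theta h} \cdot X_S$, so it is necessary to compute
the Bayesian estimator of $e^{-\theta h}$. We get

$$E(e^{-Th} \,|\, X_{(S)}) = \int e^{-\theta h}\, \frac{\sqrt{\alpha}}{\sqrt{2\pi}}\, e^{-\frac{\alpha}{2}(\theta - \frac{\beta}{\alpha})^2}\, d\theta = \exp\Big(-\frac{2\beta - h}{2\alpha} \cdot h\Big),$$

hence the Bayesian predictor $p_0(X_{(S)}) = \exp\big(-\frac{2\beta - h}{2\alpha} \cdot h\big) \cdot X_S$. The predictor associated with
the MLE is $\hat p(X_{(S)}) = e^{-\hat\theta_S h} \cdot X_S$ and finally, an alternative form of the predictor, associated
with the MAP, should be $\check p(X_{(S)}) = e^{-\tilde\theta_S h} \cdot X_S$.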
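A minimal Python sketch of these quantities is given below. It is an illustration under assumptions, not the authors' implementation: the path is assumed to be sampled on a regular grid of step `dt`, the integral of $X_t^2$ is approximated by the trapezoidal rule, and the function names are hypothetical.

```python
import math


def posterior_params(path, dt, theta0, v2):
    """Return (alpha, beta) of the N(beta/alpha, 1/alpha) posterior of T.

    The integral of X_t^2 over [0, S] uses the trapezoidal rule (an assumption).
    """
    s = dt * (len(path) - 1)
    integral_x2 = dt * (sum(x * x for x in path) - 0.5 * (path[0] ** 2 + path[-1] ** 2))
    alpha = integral_x2 + 1.0 / v2
    beta = 0.5 * (s - path[-1] ** 2 + path[0] ** 2) + theta0 / v2
    return alpha, beta


def bayes_theta(alpha, beta):
    """Posterior mean of theta, which is also the MAP since the posterior is Gaussian."""
    return beta / alpha


def bayes_predictor(x_s, h, alpha, beta):
    """Bayesian predictor exp(-(2 beta - h) h / (2 alpha)) * X_S."""
    return math.exp(-(2.0 * beta - h) * h / (2.0 * alpha)) * x_s


def plug_in_predictor(x_s, h, theta_est):
    """Plug-in predictor exp(-theta_est * h) * X_S (MLE or MAP version)."""
    return math.exp(-theta_est * h) * x_s
```

The Bayesian predictor differs from the plug-in MAP predictor only by the factor $\exp(h^2 / (2\alpha))$, which is at least 1 and tends to 1 as $\alpha$ grows, for instance when the observation window gets long.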

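Remark: In order to compare $\tilde\theta_S$ with $\hat\theta_S$, one may use an approximation of (6.3) by
setting $\gamma_{app,S} = \big(1 + \frac{2\theta}{S v^2}\big)^{-1}$. Then, the equivalent form of Lemma 4.3 for estimation gives
the "approximate" condition

$$\tilde\theta_S \prec \hat\theta_S \iff |\theta - \theta_0| \le \Big(1 + \frac{S v^2}{\theta}\Big)^{1/2} \big(E_\theta(\hat\theta_S - \theta)^2\big)^{1/2};$$

note, however, that $\hat\theta_S$ is not unbiased.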



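Finally, one may consider alternative priors, such as the translated exponential distribution
with density $\varphi(\theta) = \eta \exp(-\eta(\theta - \theta_0))\, \mathbf{1}_{]\theta_0, +\infty[}(\theta)$, ($\eta > 0$, $\theta_0 > 0$). If $\psi$ denotes the
density of $\mathcal{N}\big(-\frac{a}{2b}, \frac{1}{b}\big)$, with $a = x_S^2 - x_0^2 - S + 2\eta$ and $b = \int_0^S x_t^2\, dt$, the Bayesian estimator
is implicit and given by $\tilde\theta_S = \int_{\theta_0}^{\infty} \theta\, \psi(\theta)\, d\theta \Big/ \int_{\theta_0}^{\infty} \psi(\theta)\, d\theta$. The derivation is left to the reader.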

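7. Ornstein-Uhlenbeck process for sampled data

We now consider the more realistic case where only $X_0, X_\delta, \dots, X_{n\delta}$ are observed and one
wants to predict $X_{(n+h)\delta}$, ($h > 0$).

7.1. Estimation of $m$. If $\theta$ is known, and $m \in \mathbb{R}$ unknown, the associated model is

(7.1) $X_{n\delta} - m = e^{-\theta\delta}\,(X_{(n-1)\delta} - m) + \varepsilon_{n\delta}, \quad n \in \mathbb{Z},$

and

(7.2) $\mathrm{Var}(\varepsilon_{n\delta}) = \dfrac{1 - e^{-2\theta\delta}}{2\theta} =: \sigma^2_{\delta,\theta}.$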
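If $\delta > 0$ is fixed, we deal with a classical AR(1), so we will focus on the case where $\delta = \delta_n$ is
"small". One may use various conditions as $n \to \infty$: $\delta_n \to 0$ and $n\delta_n \to \infty$, or $\delta_n \to 0$ and
$n\delta_n \to S > 0$, for example. Two approaches are possible: either considering the likelihood
or the conditional likelihood ($X_0$ is arbitrary but non random), which has a simpler form.

7.1.1. Unconditional estimation. Since $X_0 - m, \varepsilon_{\delta_n}, \dots, \varepsilon_{n\delta_n} \sim \mathcal{N}(0, (2\theta)^{-1}) \otimes \mathcal{N}(0, \sigma^2_{\delta,\theta})^{\otimes n}$,
one may deduce that $(X_0 - m, X_{\delta_n} - m, \dots, X_{n\delta_n} - m)$ has the density

$$f(x_0, x_1, \dots, x_n) = \Big(\frac{\theta}{\pi}\Big)^{1/2} \frac{1}{(\sigma_{\delta,\theta}\sqrt{2\pi})^n} \exp\Big(-\theta(x_0 - m)^2 - \sum_{i=1}^{n} \frac{\big(x_i - e^{-\theta\delta_n} x_{i-1} - m(1 - e^{-\theta\delta_n})\big)^2}{2\sigma^2_{\delta,\theta}}\Big).$$

This yields

(7.3) $\hat m_n = \dfrac{X_0 + X_{n\delta_n} + (1 - e^{-\theta\delta_n}) \sum_{i=1}^{n-1} X_{i\delta_n}}{n(1 - e^{-\theta\delta_n}) + 1 + e^{-\theta\delta_n}}$

for the MLE, while if $M \sim \mathcal{N}(m_0, u^2)$, one has

$$L(X_0, \dots, X_{n\delta_n}, M) = \Big(\frac{\theta}{\pi}\Big)^{1/2} \frac{1}{(\sigma_{\delta,\theta}\sqrt{2\pi})^n} \exp\Big(-\theta(X_0 - M)^2 - \sum_{i=1}^{n} \frac{\big(X_{i\delta_n} - e^{-\theta\delta_n} X_{(i-1)\delta_n} - M(1 - e^{-\theta\delta_n})\big)^2}{2\sigma^2_{\delta,\theta}}\Big) \times \frac{1}{u\sqrt{2\pi}} \exp\Big(-\frac{1}{2u^2}(M - m_0)^2\Big),$$

giving

(7.4) $\tilde m_n = \dfrac{X_0 + X_{n\delta_n} + (1 - e^{-\theta\delta_n}) \sum_{i=1}^{n-1} X_{i\delta_n} + (1 + e^{-\theta\delta_n}) \frac{m_0}{2\theta u^2}}{n(1 - e^{-\theta\delta_n}) + (1 + e^{-\theta\delta_n})\big(1 + \frac{1}{2\theta u^2}\big)}.$

Again, we have $\tilde m_n = \alpha_n \hat m_n + (1 - \alpha_n) m_0$ with

$$\alpha_n = \frac{n(1 - e^{-\theta\delta_n}) + 1 + e^{-\theta\delta_n}}{n(1 - e^{-\theta\delta_n}) + (1 + e^{-\theta\delta_n})\big(1 + (2\theta u^2)^{-1}\big)}.$$

Since $E(X_{(n+h)\delta_n} | X_{n\delta_n}) = e^{-\theta h\delta_n}(X_{n\delta_n} - m) + m$, the derived predictors of $X_{(n+h)\delta_n}$, $h \ge 1$,
are given by $\hat p_n(X_{n\delta_n}) = \hat m_n (1 - e^{-\theta h\delta_n}) + e^{-\theta h\delta_n} X_{n\delta_n}$, while $p_{0,n}(X_{n\delta_n}) = \tilde m_n (1 - e^{-\theta h\delta_n}) +
e^{-\theta h\delta_n} X_{n\delta_n}$, and Lemma 4.3 implies that

$$p_{0,n} \prec \hat p_n \iff (m - m_0)^2 \le \frac{2n(1 - e^{-\theta\delta_n}) + (1 + e^{-\theta\delta_n})\big(2 + (2\theta u^2)^{-1}\big)}{(1 + e^{-\theta\delta_n})(2\theta u^2)^{-1}} \times E_m(\hat m_n - m)^2.$$

Next, an easy but tedious computation gives $E_m(\hat m_n - m)^2 = \dfrac{1}{2\theta} \cdot \dfrac{1 + e^{-\theta\delta_n}}{n(1 - e^{-\theta\delta_n}) + 1 + e^{-\theta\delta_n}}$, yielding the
equivalence:

$$p_{0,n} \prec \hat p_n \iff (m - m_0)^2 \le 2u^2 + \frac{1 + e^{-\theta\delta_n}}{2\theta\big(1 + e^{-\theta\delta_n} + n(1 - e^{-\theta\delta_n})\big)}.$$

Asymptotically, we get that, if $\delta_n \to 0$ and $n\delta_n \to S > 0$, $p_{0,n} \prec_{n \to \infty} \hat p_n$ is equivalent to $(m - m_0)^2 \le 2u^2 + \frac{1}{\theta(2 + \theta S)}$.
The condition $S \to \infty$ implies in turn the equivalence $p_{0,n} \prec_{n \to \infty} \hat p_n \iff (m - m_0)^2 \le 2u^2$.
These are the same results as in the continuous case (cf. Proposition 6.1). If $n\delta_n \to S > 0$,
note that our estimators of $m$ are no longer consistent! But still in this case, a good choice
of the prior should allow reductions of risks of estimation and prediction.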
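Formulas (7.3) and (7.4) translate directly into code. The Python sketch below is illustrative only; the function names are hypothetical and `x` is assumed to hold the sampled values $X_0, X_{\delta}, \dots, X_{n\delta}$ in order.

```python
import math


def mle_mean_sampled(x, theta, delta):
    """MLE (7.3) of m from x = [X_0, X_delta, ..., X_{n delta}]."""
    n = len(x) - 1
    rho = math.exp(-theta * delta)
    numerator = x[0] + x[-1] + (1.0 - rho) * sum(x[1:-1])
    return numerator / (n * (1.0 - rho) + 1.0 + rho)


def bayes_mean_sampled(x, theta, delta, m0, u2):
    """Bayesian estimator (7.4) of m under the N(m0, u2) prior."""
    n = len(x) - 1
    rho = math.exp(-theta * delta)
    prior_weight = (1.0 + rho) / (2.0 * theta * u2)
    numerator = x[0] + x[-1] + (1.0 - rho) * sum(x[1:-1]) + prior_weight * m0
    return numerator / (n * (1.0 - rho) + 1.0 + rho + prior_weight)


def alpha_n(n, theta, delta, u2):
    """Weight alpha_n such that bayes = alpha_n * mle + (1 - alpha_n) * m0."""
    rho = math.exp(-theta * delta)
    base = n * (1.0 - rho) + 1.0 + rho
    return base / (base + (1.0 + rho) / (2.0 * theta * u2))


def bayes_threshold(n, theta, delta, u2):
    """Upper bound on (m - m0)^2 for the Bayesian predictor to be preferable."""
    rho = math.exp(-theta * delta)
    return 2.0 * u2 + (1.0 + rho) / (2.0 * theta * (1.0 + rho + n * (1.0 - rho)))
```

As $\delta \to 0$ with $n\delta \to S$, `bayes_threshold` approaches $2u^2 + 1/(\theta(2 + \theta S))$, in line with the continuous-time condition of Proposition 6.1.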
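7.1.2. Conditional likelihood. In this part, we use the conditional likelihood given $X_0$, and choosing
$M \sim \mathcal{N}(m_0, u^2)$, ($u > 0$), we obtain the "density" of $(X_{\delta_n}, \dots, X_{n\delta_n}, M)$:

$$L(X_{\delta_n}, \dots, X_{n\delta_n}, M) = \frac{1}{(\sigma_{\delta,\theta}\sqrt{2\pi})^n} \exp\Big(-\frac{1}{2\sigma^2_{\delta,\theta}} \sum_{i=1}^{n} \big(X_{i\delta_n} - \rho_\delta X_{(i-1)\delta_n} + M(\rho_\delta - 1)\big)^2\Big) \times \frac{1}{u\sqrt{2\pi}} \exp\Big(-\frac{1}{2u^2}(M - m_0)^2\Big),$$

where $\rho_\delta := \rho(\theta, \delta_n) = \exp(-\theta\delta_n)$ and $\sigma^2_{\delta,\theta}$ is defined in (7.2). Since we are in the Gaussian
case, the conditional mode and the conditional expectation coincide; now:

$$\ln L = c - \frac{1}{2\sigma^2_{\delta,\theta}} \sum_{i=1}^{n} \big(X_{i\delta_n} - \rho_\delta X_{(i-1)\delta_n} + M(\rho_\delta - 1)\big)^2 - \frac{1}{2u^2}(M - m_0)^2,$$

where $c$ does not depend on $M$. It follows that the Bayesian estimator is now given by

(7.5) $\tilde m_n = \dfrac{(1 - \rho_\delta) \sum_{i=1}^{n} \big(X_{i\delta_n} - \rho_\delta X_{(i-1)\delta_n}\big) + m_0 \frac{\sigma^2_{\delta,\theta}}{u^2}}{(1 - \rho_\delta)^2\, n + \frac{\sigma^2_{\delta,\theta}}{u^2}},$

while the conditional MLE takes the form

(7.6) $\hat m_n = \dfrac{\sum_{i=1}^{n} \big(X_{i\delta_n} - \rho_\delta X_{(i-1)\delta_n}\big)}{(1 - \rho_\delta)\, n}.$

We may slightly modify the estimator (7.5) to obtain

(7.7) $\tilde m_n = \beta_n \bar X_n + (1 - \beta_n)\, m_0$

with $\bar X_n = n^{-1} \sum_{i=1}^{n} X_{i\delta_n}$ and $\beta_n = \dfrac{(1 - \rho_\delta)^2}{(1 - \rho_\delta)^2 + \frac{\sigma^2_{\delta,\theta}}{n u^2}}$. Hence

$$\frac{1 + \beta_n}{1 - \beta_n} = \frac{4\theta u^2 n (1 - e^{-\theta\delta_n}) + 1 + e^{-\theta\delta_n}}{1 + e^{-\theta\delta_n}}$$

and, since $\mathrm{Var}(\bar X_n) = \dfrac{1 - \rho_\delta^2 - \frac{2\rho_\delta}{n}(1 - \rho_\delta^n)}{2n\theta(1 - \rho_\delta)^2}$, asymptotically we get that, if $\delta_n \to 0$, $n\delta_n \to S > 0$,

$$\tilde m_n \prec_{n \to \infty} \bar X_n \iff (m - m_0)^2 \le \frac{(1 + 2u^2\theta^2 S)(\theta S - 1 + e^{-\theta S})}{\theta^3 S^2},$$

while if $\delta_n \to 0$, $n\delta_n \to \infty$, we get the equivalence: $\tilde m_n \prec_{n \to \infty} \bar X_n \iff (m - m_0)^2 \le 2u^2$.
Again, the same results are obtained for predictors.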



TABLE 3. $L^2$-prediction error ($m$ unknown) for the MLE predictor and percentage variation of the $L^2$-prediction error for the others in the case where $\theta = 1$, $H = 1$, $u^2 = 1$ and $\delta = 0.1$.

| $n$ | MLE | Mean (%) | CMLE (%) | Bay (%), $m_0=4$ | Bay (%), $m_0=5$ | Bay (%), $m_0=7$ | CMAP2 (%), $m_0=4$ | CMAP2 (%), $m_0=5$ | CMAP2 (%), $m_0=7$ |
|---|---|---|---|---|---|---|---|---|---|
| 15 | 0.548 | 2.76 | 27.97 | -4.87 | -8.52 | 5.77 | -1.04 | -12.86 | 33.54 |
| 30 | 0.499 | 2.82 | 10.54 | -2.79 | -4.72 | 4.78 | -0.53 | -5.02 | 16.05 |
| 50 | 0.488 | 1.61 | 5.62 | -1.30 | -2.63 | 2.39 | 0.03 | -2.28 | 6.77 |
| 100 | 0.464 | 0.35 | 1.35 | -0.62 | -1.08 | 1.08 | -0.35 | -0.99 | 2.00 |

7.2. Estimation of $\rho$. In the case where $m$ is known (one may set $m = 0$), we now choose $\mathcal{N}(\rho_0, v^2)$ as a prior for $\rho = e^{-\theta\delta_n}$, with $0 < \rho_0 < 1$ and $v > 0$. Note that this prior is reasonable as soon as $\rho$ is not too far from 1 and $v$ is not too large. Using again the conditional likelihood, one obtains the expression:

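$$\widetilde{L}(X_{\delta_n}, \dots, X_{n\delta_n}, \rho) = \frac{1}{(\sigma_{\delta,\theta}\sqrt{2\pi})^{n}} \exp\Big(-\frac{1}{2\sigma^2_{\delta,\theta}} \sum_{i=1}^{n} \big(X_{i\delta_n} - \rho X_{(i-1)\delta_n}\big)^2\Big) \times \frac{1}{v\sqrt{2\pi}} \exp\Big(-\frac{(\rho - \rho_0)^2}{2v^2}\Big).$$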

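Since $\sigma^2_{\delta,\theta}$ depends on $\rho$, we make the approximation $\sigma^2_{\delta,\theta} \approx \delta_n$ to obtain the posterior distribution $\mathcal{N}(B/A,\, 1/A)$, where

$$A = \frac{1}{\delta_n} \sum_{i=1}^{n} X^2_{(i-1)\delta_n} + \frac{1}{v^2} \quad\text{and}\quad B = \frac{1}{\delta_n} \sum_{i=1}^{n} X_{(i-1)\delta_n} X_{i\delta_n} + \frac{\rho_0}{v^2},$$

hence the "Bayesian" estimator takes the form

$$\rho_n = \frac{\sum_{i=1}^{n} X_{(i-1)\delta_n} X_{i\delta_n} + \rho_0\,\delta_n / v^2}{\sum_{i=1}^{n} X^2_{(i-1)\delta_n} + \delta_n / v^2}. \tag{7.8}$$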

Comparison with the conditional MLE

$$\rho_n = \frac{\sum_{i=1}^{n} X_{(i-1)\delta_n} X_{i\delta_n}}{\sum_{i=1}^{n} X^2_{(i-1)\delta_n}} \tag{7.9}$$

is rather intricate and will be illustrated numerically in the next section.
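Both estimators of $\rho$ are simple ratios of sums, as the following Python sketch shows. It assumes a centred path (that is, $m = 0$) stored as `x` $= [X_0, X_{\delta_n}, \dots, X_{n\delta_n}]$; the function names are hypothetical.

```python
import numpy as np


def cmle_rho(x):
    """Conditional MLE (least-squares) estimate of rho from a centred path x."""
    x = np.asarray(x, dtype=float)
    return np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)


def bayes_rho(x, delta, rho0, v2):
    """Posterior mean B/A under the prior N(rho0, v2), using sigma^2 ~ delta."""
    x = np.asarray(x, dtype=float)
    num = np.sum(x[:-1] * x[1:]) + rho0 * delta / v2
    den = np.sum(x[:-1] ** 2) + delta / v2
    return num / den
```

The "Bayesian" estimator is a weighted average of the conditional MLE and $\rho_0$, with weight $\sum X^2_{(i-1)\delta_n} / (\sum X^2_{(i-1)\delta_n} + \delta_n/v^2)$ on the conditional MLE. With $\delta_n = 0.1$ and $v^2 = 0.01$, for instance, the prior counts as much as a sum of squares equal to 10.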

7.3. Simulation. For $\theta \in \{0.5, 1, 2\}$, $m = 5$, various sample sizes $n$ and values of $\delta$, 5000 replications of Ornstein-Uhlenbeck sample paths are computed from the autoregressive relation (7.1). First, for known $\theta$ but unknown $m$, we compare various predictors of $X_{n\delta+H}$, with $H = h\delta$ and $H = 0.5$, 1 or 2, defined by $m_n(1 - e^{-\theta h\delta}) + e^{-\theta h\delta} X_{n\delta}$, where $m_n$ refers to estimators which are either:

• non-Bayesian: MLE with $m_n$ defined in (7.3), Mean $\overline{X}_n$, CMLE with $m_n$ defined in (7.6),

• or Bayesian: Bayes with $m_n$ defined in (7.4), CMAP1 with $m_n$ defined in (7.5) ($u^2 = 1$) and CMAP2 with $m_n$ defined in (7.7) ($u^2 = 1$).

Among all non-Bayesian estimators and in all cases, it emerges that the MLE outperforms the other two, with a very poor behaviour of the CMLE compared with the others, a fact already noticed by Cox (1991). For this reason, our results below do not report the values obtained for CMAP1, because of its too high sensitivity to the CMLE. In Table 3 we give the rounded empirical $L^2$-prediction error of the MLE and, for comparison, the percentage variations observed for the other predictors for $\theta = 1$ and $\delta = 0.1$. It appears that all errors decrease as $n$ increases, and Bayes predictors are highly competitive for small sample sizes and a good choice of prior, namely $M \sim \mathcal{N}(m_0, 1)$ with $m_0 \in \big(5 - \sqrt{2 + (S+2)^{-1}},\; 5 + \sqrt{2 + (S+2)^{-1}}\big)$ or asymptotically, as $S = n\delta \to \infty$, $m_0 \in (5 - \sqrt{2},\, 5 + \sqrt{2})$, see Section 7.1.1. In this way, errors are significantly reduced for $m_0 = 4$ or 5 and $n$ less than 50, while a bad choice like $m_0 = 7$ damages them dramatically. It also appears that CMAP2 has the smallest errors, but only in a small region around $m$, the Bayesian predictor (with $m_n$ defined in (7.4)) being more robust against the choice of $m_0$. These results are confirmed in Figure 2, where errors are given in terms of $m_0$: as expected, we obtain parabolic curves for Bayesian predictors. Again, the Bayesian setting improves the errors for good choices of prior (especially for small values of $\delta$ and $n\delta$, where the MLE is not so good) and otherwise deteriorates them.

FIGURE 2. $L^2$-prediction error for $m$ unknown ($m = 5$) and $\theta = 1$ (known), $\delta = 0.1$, in terms of $m_0$ with $\mathcal{N}(m_0, 1)$ prior: MLE (plain), CMLE (twodash), Mean (dashed), Bayes (longdash), CMAP1 (dotted), CMAP2 (dotdash) when $u^2 = 1$. Vertical lines correspond to $m_0 = 5 \pm \sqrt{2u^2}$. On the left: $n = 30$ ($S = 3$, $S + h\delta = 4$), on the right: $n = 100$ ($S = 10$, $S + h\delta = 11$).
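The simulation design can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the autoregressive relation is taken to be $X_{i\delta} - m = \rho_\delta (X_{(i-1)\delta} - m) + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, (1 - \rho_\delta^2)/(2\theta))$ and a stationary start, and only the Mean and CMAP2 (7.7) plug-in predictors are implemented, together with an "oracle" predictor using the true $m$. The MLE (7.3) and Bayes (7.4) estimators are left out because their formulas are not restated here. The function name, its arguments and the seed are hypothetical.

```python
import numpy as np


def simulate_errors(n=20, delta=0.1, theta=1.0, m=5.0, H=1.0,
                    m0=5.0, u2=1.0, reps=5000, seed=0):
    """Monte Carlo L2-prediction errors of plug-in predictors of X_{n*delta+H}."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-theta * delta)
    sigma2 = (1.0 - rho ** 2) / (2.0 * theta)  # innovation variance (assumed)

    # Stationary start, then X_i - m = rho * (X_{i-1} - m) + eps_i.
    x = np.empty((reps, n + 1))
    x[:, 0] = m + rng.normal(0.0, np.sqrt(1.0 / (2.0 * theta)), reps)
    for i in range(1, n + 1):
        x[:, i] = m + rho * (x[:, i - 1] - m) + rng.normal(0.0, np.sqrt(sigma2), reps)

    # Value to predict, drawn from the exact transition over the horizon H.
    rho_h = np.exp(-theta * H)
    target = m + rho_h * (x[:, -1] - m) + rng.normal(
        0.0, np.sqrt((1.0 - rho_h ** 2) / (2.0 * theta)), reps)

    xbar = x[:, 1:].mean(axis=1)              # Mean estimator
    a = (1.0 - rho) ** 2
    beta = a / (a + sigma2 / (n * u2))
    cmap2 = beta * xbar + (1.0 - beta) * m0   # estimator (7.7)

    def l2_error(m_hat):
        pred = m_hat * (1.0 - rho_h) + rho_h * x[:, -1]
        return float(np.mean((target - pred) ** 2))

    return {"oracle": l2_error(m), "mean": l2_error(xbar), "cmap2": l2_error(cmap2)}
```

With the default arguments ($n = 20$, $\delta = 0.1$, $\theta = 1$, $H = 1$, $m_0 = 5$), the `mean` and `cmap2` errors can be set against the Mean and CMAP2 entries reported for the same configuration. Exact agreement is not expected, since the seed and implementation details differ. The `oracle` error estimates the irreducible part $(1 - e^{-2\theta H})/(2\theta)$ of the prediction error.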



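TABLE 4. $L^2$-prediction error ($m$ unknown) for the MLE predictor and percentage variation of the $L^2$-prediction error for the others in the case where $\theta = 1$, $H = 1$, $u^2 = 1$ and $m_0 \in \{4, 5, 7\}$.

| $n$ | 10 | 10 | 10 | 20 | 20 | 20 | 50 | 50 | 50 | 100 | 100 | 100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\delta$ | 0.1 | 0.2 | 0.5 | 0.1 | 0.2 | 0.5 | 0.1 | 0.2 | 0.5 | 0.1 | 0.2 | 0.5 |
| MLE | 0.586 | 0.531 | 0.488 | 0.531 | 0.498 | 0.464 | 0.488 | 0.464 | 0.441 | 0.464 | 0.461 | 0.444 |
| Mean (%) | 2.77 | 2.46 | 1.92 | 2.40 | 2.13 | 0.44 | 1.61 | 0.36 | 0.16 | 0.35 | 0.16 | -0.06 |
| CMLE (%) | 47.01 | 17.69 | 5.73 | 17.56 | 7.38 | 1.40 | 5.62 | 1.36 | 0.24 | 1.35 | 0.47 | -0.05 |
| Bay_4 (%) | -5.06 | -4.23 | -1.31 | -4.22 | -1.84 | -0.65 | -1.30 | -0.62 | -0.03 | -0.62 | -0.31 | -0.04 |
| Bay_5 (%) | -10.48 | -6.66 | -2.68 | -6.65 | -3.41 | -1.11 | -2.63 | -1.08 | -0.23 | -1.08 | -0.37 | -0.05 |
| Bay_7 (%) | 4.26 | 6.58 | 2.47 | 6.57 | 3.32 | 1.11 | 2.39 | 1.08 | 0.11 | 1.08 | 0.49 | 0.12 |
| CMAP2_4 (%) | 1.89 | -1.70 | 0.26 | -1.70 | -0.13 | -0.30 | 0.03 | -0.34 | 0.13 | -0.35 | -0.18 | -0.10 |
| CMAP2_5 (%) | -17.35 | -9.23 | -2.13 | -9.23 | -3.20 | -0.96 | -2.28 | -0.98 | -0.09 | -0.99 | -0.26 | -0.12 |
| CMAP2_7 (%) | 46.63 | 26.12 | 7.21 | 25.96 | 10.04 | 2.15 | 6.77 | 2.02 | 0.31 | 2.00 | 0.75 | 0.07 |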
In Table 4 we compare the errors obtained for varying values of $\delta$, while in Table 5 the influence of $\theta$ is measured. First, it appears that the errors depend only on $S = n\delta$, and not on the individual values of $n$ and $\delta$ (compare, for instance, the entries with $n\delta = 2$, 5 or 10). This is not a surprise, since examination of the $L^2$-risks shows that, for each estimator, the leading terms depend on $n$ and $\delta$ through $n\delta$. Moreover, errors are much larger when $\delta$ and/or $\theta$ are small. Again, this agrees with our theoretical framework: the smaller $\delta$ is, the more correlated the observations are, the larger the covariances, and the more the overall risk is degraded. Also, small values of $\theta$ correspond to variables with high variance, since $\mathrm{Var}(X_i) = (2\theta)^{-1}$, and prediction is more difficult in this case. Finally, errors are represented in terms of $n$ in Figure 3 (left): not surprisingly, errors decrease and the estimators are asymptotically equivalent.
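To read Tables 4 and 5, it helps to convert a percentage variation back into an absolute error. The short Python sketch below assumes that the percentage variation of a predictor is $100\,(\text{error} - \text{error}_{\mathrm{MLE}})/\text{error}_{\mathrm{MLE}}$; the tables do not spell out this convention, and the function names are hypothetical.

```python
def percentage_variation(error, reference):
    """Relative change of `error` with respect to `reference`, in percent."""
    return 100.0 * (error - reference) / reference


def error_from_variation(reference, variation_pct):
    """Invert the percentage variation to recover an absolute error."""
    return reference * (1.0 + variation_pct / 100.0)
```

For instance, with $n = 10$ and $\delta = 0.1$, an MLE error of 0.586 and a Bay_5 variation of -10.48% correspond to an absolute error of about 0.525 under this convention.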

TABLE 5. $L^2$-prediction error ($m$ unknown) for the MLE predictor and percentage variation of the $L^2$-prediction error for the others in the case where $n = 20$, $\delta = 0.1$, $u^2 = 1$ and $m_0 \in \{4, 5, 7\}$.

| $\theta$ | 0.5 | 0.5 | 0.5 | 1 | 1 | 1 | 2 | 2 | 2 |
|---|---|---|---|---|---|---|---|---|---|
| $H$ | 0.5 | 1 | 2 | 0.5 | 1 | 2 | 0.5 | 1 | 2 |
| MLE | 0.421 | 0.728 | 1.138 | 0.34 | 0.531 | 0.677 | 0.249 | 0.303 | 0.333 |
| Mean (%) | 0.51 | 1.32 | 2.23 | 1.81 | 2.4 | 4.07 | 1.88 | 2.72 | 2.85 |
| CMLE (%) | 15.03 | 27.59 | 48.31 | 10.76 | 17.56 | 27.37 | 6.04 | 9.76 | 10.65 |
| Bay_4 (%) | -3.41 | -5.9 | -9.32 | -2.72 | -4.22 | -6.09 | -1.04 | -0.92 | -2.15 |
| Bay_5 (%) | -5.27 | -9.31 | -15.56 | -4.11 | -6.65 | -9.82 | -1.99 | -2.97 | -3.58 |
| Bay_7 (%) | 2.18 | 4.31 | 5.68 | 4.04 | 6.57 | 9.24 | 1.82 | 1.7 | 3.87 |
| CMAP2_4 (%) | -2.2 | -3.33 | -4.77 | -1.07 | -1.7 | -1.96 | 0.67 | 1.85 | 0.33 |
| CMAP2_5 (%) | -7.41 | -12.86 | -21.55 | -5.5 | -9.23 | -13.28 | -1.35 | -2.1 | -2.9 |
| CMAP2_7 (%) | 13.2 | 24.79 | 38.55 | 16.06 | 25.96 | 37.82 | 6.56 | 8.36 | 12.21 |



FIGURE 3. On the left: $L^2$-prediction error for unknown $m$, with prior $\mathcal{N}(5, 1)$ and known $\theta$ ($\theta = 1$), in terms of $n$ when $\delta = 0.1$: MLE (plain), CMLE (twodash), Mean (dashed), Bayes (longdash), CMAP1 (dotted), CMAP2 (dotdash). On the right: $L^2$-prediction error for unknown $\rho$, with prior $\mathcal{N}(\rho_0, 10^{-2})$, in terms of $n$ when $\delta = 0.1$: CMLE (plain), CBayes with $\rho_0 = 0.9$ (dashed), CBayes with $\rho_0 = 0.83$ (dotted).
TABLE 6. $L^2$-prediction error ($\theta$ unknown) for the CMLE predictor and percentage variation of the $L^2$-prediction error for the "Bayes" predictor in the case where $H = 1$, $\delta = 0.1$, $v^2 = 0.01$ and $\rho_0 \in \{0.5, 0.75, 0.85, 0.9\}$.

| $n$ | $\theta$ | CMLE | "Bayes" (%), $\rho_0=0.5$ | "Bayes" (%), $\rho_0=0.75$ | "Bayes" (%), $\rho_0=0.85$ | "Bayes" (%), $\rho_0=0.9$ |
|---|---|---|---|---|---|---|
| 20 | 0.5 | 0.83 | -8.41 | -14.67 | -14.90 | -13.55 |
| 20 | 1 | 0.503 | -7.25 | -12.05 | -12.66 | -11.48 |
| 20 | 2 | 0.266 | -5.82 | -6.56 | -5.42 | -3.21 |
| 100 | 0.5 | 0.679 | 3.76 | -0.25 | -1.04 | -1.15 |
| 100 | 1 | 0.444 | 0.89 | -1.13 | -1.22 | -0.95 |
| 100 | 2 | 0.242 | -0.44 | -0.77 | -0.46 | -0.03 |

Concerning prediction when $\theta$ is unknown ($m$ known), we have computed the two predictors derived from the estimators given by (7.9) (CMLE) and (7.8) ("Bayes"). Figure 3 (right) shows that errors decrease with $n$ and that Bayes predictors are much better for small values of $n$. A noteworthy result is that errors are significantly improved for any choice of prior, at least for small $n$: see Table 6 for $n = 20$ and Figure 4 (left) for $n = 30$. This last conclusion may be tempered by the possibly bad behaviour of the CMLE in this framework. The Bayes predictor is more sensitive to the prior for $n = 100$ (and $\rho_0$ larger than 0.8).
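The way an estimate of $\rho$ is turned into a predictor can be sketched as below. The sketch assumes the plug-in form obtained from the predictor of Section 7.3, $m(1 - e^{-\theta h\delta}) + e^{-\theta h\delta} X_{n\delta}$, with $e^{-\theta h\delta} = \rho^h$ replaced by the $h$-th power of the estimate. This form is an assumption, since the text does not write it out for unknown $\theta$, and the function name is hypothetical.

```python
def predict_with_rho(x_last, rho_hat, h, m=0.0):
    """Plug-in h-step predictor m + rho_hat**h * (x_last - m)."""
    return m + rho_hat ** h * (x_last - m)
```

Since the derivative of $\rho^h$ is $h\rho^{h-1}$, an error on $\rho$ is amplified at long horizons: for $\rho = e^{-0.1}$ and $h = 10$ the factor is $10\,e^{-0.9} \approx 4.07$. This is one way to see why shrinking the estimate toward a sensible $\rho_0$ can pay off when $n$ is small.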
FIGURE 4. $L^2$-prediction error for $\rho = \exp(-\theta\delta)$ unknown ($\theta = 1$ and $\delta = 0.1$) and $m = 5$ (known), in terms of $\rho_0$ with $\mathcal{N}(\rho_0, v^2)$ prior: CMLE (plain horizontal), Bayes predictor (dashed) when $v^2 = 10^{-2}$. On the left: $n = 30$ ($S = 3$, $S + h\delta = 4$), on the right: $n = 100$ ($S = 10$, $S + h\delta = 11$).